Cache Awareness in Blocking Techniques

Authors

  • O. Temam
  • C. Fricker
  • W. Jalby
Abstract

To date, data locality optimizing algorithms mostly aim at providing strategies for blocking and reordering loops. But little research has been devoted to the final step: finding the optimal block size, i.e., a block size that provides the best possible performance. Optimal block sizes are currently computed as if a cache is a local memory, i.e., cache interferences are ignored. Case studies have already shown that cache interferences can greatly affect the optimal block size value. The purpose of this article is to show that analytical modeling of cache interferences can be used to compute near-optimal block sizes for blocked loop nests. First, the method for evaluating cache interferences is presented. Second, the model is validated by correlating the estimated miss ratio with the simulated miss ratio and the execution time of various loop nests. Then, current techniques for computing the optimal block size are analytically and experimentally shown to yield below-optimal performance. Finally, current block size computation techniques are augmented with analytical modeling of cache interferences and TLB misses, and this new technique is shown to yield near-optimal performance and make blocking techniques safe. Reciprocally, it is also shown that even when no capacity miss occurs, finely tuned blocking techniques can be used to significantly reduce the number of cache interferences.
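
To make the notion of a block size concrete, here is a minimal Python sketch of a blocked loop nest (a tiled matrix multiply) together with a capacity-only block size rule. It is an illustration under stated assumptions, not the authors' model: the function names `capacity_only_block_size` and `blocked_matmul` are hypothetical, and the rule "one square tile of B x B elements must fit in the cache" is only one simple way to treat the cache as a local memory, which is the view the abstract says ignores cache interferences.

```python
import math


def capacity_only_block_size(cache_bytes, elem_bytes=8):
    """Largest B such that a B x B tile of elements fits in the cache.

    Assumption for illustration: the cache is treated as a local memory,
    so conflicts (interferences) between addresses are ignored entirely.
    """
    return math.isqrt(cache_bytes // elem_bytes)


def blocked_matmul(a, b, n, block):
    """Compute C = A * B for n x n matrices stored as lists of rows.

    The k and j loops are tiled with the same block size, so each
    (kk, jj) pair reuses one block x block tile of B across all rows i.
    """
    c = [[0.0] * n for _ in range(n)]
    for kk in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(n):
                for k in range(kk, min(kk + block, n)):
                    aik = a[i][k]
                    for j in range(jj, min(jj + block, n)):
                        c[i][j] += aik * b[k][j]
    return c
```

For example, with an 8 KiB cache and 8-byte elements, `capacity_only_block_size(8192)` returns 32. The abstract's point is that a value chosen this way can be far from optimal once interferences are taken into account.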

Similar resources

Contents II Cache Awareness in Blocking Techniques 76 8

To date, data locality optimizing algorithms mostly aim at providing efficient strategies for blocking and reordering loops. But little research has been devoted to the final step, i.e., computing the optimal block size. Optimal block sizes are currently computed as if a cache behaves as a local memory, i.e., cache interference phenomena are ignored. Case studies have already shown that cache inter...

In-Core Optimization of High-Order Stencil Computations

In this paper, we apply in-core optimization techniques to high-order stencil computations, including: (1) cache blocking for efficient L2 cache use; (2) register blocking and data-level parallelism via single-instruction multiple-data (SIMD) techniques to increase L1 cache efficiency; and (3) software prefetching techniques. Our generic approach is tested with a kernel extracted from a 6th-or...

DRAFT: Polynomial Multiplication: Blocking to Improve Cache Performance

We search for techniques to decrease the multiplication time for large sparse polynomials in Lisp by speeding up the sequential accesses of large vectors. We do this by utilizing blocking to improve cache performance, which we show to be effective for sufficiently large problems.

Influence of Cross-interferences on Blocked Loops: a Case Study with Matrix-vector Multiply

State-of-the-art data locality optimizing algorithms are targeted for local memories rather than for cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect blocked algorithms' cache performance. Because of cache conflicts, it is not possible to know the precise gain brought by blocking. It is even difficult to determine for which problem sizes b...

Using Cache as a Local Memory

Inability to reuse data, conflicting references, and underutilization of cache capacity are responsible for poor cache performance on various commonly used applications. Data prefetching, blocking, and data copying have been used to address these problems. These techniques, though effective, are directed towards solving one aspect of the overall problem. We propose a comprehensive solution to t...

Publication date: 1998